Maps of Asthma Rate and PM2.5 in the Bay Area

The first map shows the age-adjusted rate of emergency department visits for asthma per 10,000 people, averaged over 2015 to 2017, in the Bay Area. The rate ranges from about 50 to 150 in most of the Bay Area, with the highest rate in Vallejo at more than 200, followed by Richmond and San Leandro.

The second map shows the annual mean concentration of PM2.5, averaged over 2015 to 2017, in the Bay Area. The PM2.5 concentration is lowest in the northern and western parts of the Bay Area, at about 2 to 6 micrograms/cubic meter. The highest levels of PM2.5 are found in clusters, mostly in the middle of the Bay Area, such as Rancho Cucamonga and Bakersfield.

Normal Regression Model

The best-fit line does not represent the data well, as many clusters of points lie well above or below it. In particular, tracts with high asthma rates of 150 to 250 at around 8 to 9 micrograms/cubic meter of PM2.5 lie far from the line.

## 
## Call:
## lm(formula = Asthma ~ PM2.5, data = ces4_clean2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -50.424 -21.485  -6.539  13.432 193.479 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  34.4917     1.6229   21.25   <2e-16 ***
## PM2.5         1.7228     0.1564   11.02   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 30.34 on 8022 degrees of freedom
## Multiple R-squared:  0.01491,    Adjusted R-squared:  0.01479 
## F-statistic: 121.4 on 1 and 8022 DF,  p-value: < 2.2e-16

An increase of 1 microgram/cubic meter in PM2.5 is associated with an increase of about 1.72 asthma emergency department visits per 10,000 people. Variation in PM2.5 explains only 1.48% of the variation in the asthma rate (adjusted R-squared of 0.01479).
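As a quick check on how to read the slope, the sketch below plugs the coefficients reported in the summary output into the fitted line and compares the predicted rate at two PM2.5 levels that are 1 microgram/cubic meter apart. The function name `predict_asthma` and the example PM2.5 values of 8 and 9 are illustrative assumptions, not part of the original analysis.

```r
# Coefficients copied from the lm() summary output above
intercept <- 34.4917
slope     <- 1.7228

# Hypothetical helper: predicted asthma ED visits per 10,000 people
# for a given PM2.5 concentration (micrograms/cubic meter)
predict_asthma <- function(pm25) intercept + slope * pm25

# Illustrative PM2.5 values, chosen one unit apart
difference <- predict_asthma(9) - predict_asthma(8)

# The difference equals the slope: a one-unit rise in PM2.5
# shifts the predicted rate by the PM2.5 coefficient
print(difference)
```

Whatever pair of PM2.5 values is chosen, a one-unit gap always gives the same difference, because the model is a straight line.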

The mean of the residuals is close to zero, but the density curve is skewed: most residuals fall slightly below zero (the median is -6.5), while a long tail stretches to the right (the maximum is 193.5). This right skew suggests that the residuals are not normally distributed. The model's errors are therefore not symmetric around the line: it underestimates a minority of tracts by a large margin while slightly overestimating most others.
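The asymmetry can also be read directly off the five-number summary of the residuals printed above, without the density plot. The sketch below is a rough check under that assumption: it only compares how far the minimum and maximum sit from the median, and the variable names are illustrative.

```r
# Residual summary copied from the lm(Asthma ~ PM2.5) output above
resid_summary <- c(min = -50.424, q1 = -21.485, median = -6.539,
                   q3 = 13.432, max = 193.479)

# Distance from the median to each extreme
upper_tail <- resid_summary[["max"]] - resid_summary[["median"]]
lower_tail <- resid_summary[["median"]] - resid_summary[["min"]]

# A ratio well above 1 indicates a longer right tail
tail_ratio <- upper_tail / lower_tail
print(tail_ratio)
```

The upper tail comes out at roughly four to five times the length of the lower tail, which is what a long right tail looks like in summary form.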

Logarithmic Regression Model

Although there are still large clusters above and below the line, this model fits better, as the spread of points above the best-fit line is similar to the spread below it.

## 
## Call:
## lm(formula = log(Asthma) ~ PM2.5, data = ces4_clean2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4046 -0.3767  0.0252  0.3826  1.7603 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.34395    0.03062  109.20   <2e-16 ***
## PM2.5        0.04387    0.00295   14.87   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5725 on 8022 degrees of freedom
## Multiple R-squared:  0.02682,    Adjusted R-squared:  0.0267 
## F-statistic: 221.1 on 1 and 8022 DF,  p-value: < 2.2e-16

Because the outcome is log-transformed, the coefficient acts multiplicatively: an increase of 1 microgram/cubic meter in PM2.5 is associated with the asthma emergency department visit rate being multiplied by e^0.04387 = 1.0448, i.e. an increase of about 4.5%. Variation in PM2.5 explains 2.67% of the variation in log(Asthma).
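The back-transformation of the log-model slope can be sketched as follows, using only the coefficients reported in the summary output. The helper `predict_rate` and the example PM2.5 values of 8 and 9 are illustrative assumptions.

```r
# Coefficients copied from the lm(log(Asthma) ~ PM2.5) output above
intercept_log <- 3.34395
slope_log     <- 0.04387

# Multiplicative effect of a one-unit increase in PM2.5
multiplier     <- exp(slope_log)
percent_change <- (multiplier - 1) * 100

# Hypothetical helper: predicted rate on the original scale
# (ED visits per 10,000 people) for a given PM2.5 level
predict_rate <- function(pm25) exp(intercept_log + slope_log * pm25)

# Ratio of predictions one unit apart reproduces the multiplier
ratio <- predict_rate(9) / predict_rate(8)

print(c(multiplier = multiplier, percent_change = percent_change, ratio = ratio))
```

Unlike the linear model, the absolute change in visits per 10,000 is not constant here: the same percentage applies to a larger baseline at higher PM2.5 levels.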

The residual distribution is closer to normal now: there is less skew, with roughly equal numbers of residuals on either side of zero.
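One way to put a number on "less skew" is a quartile-based skewness coefficient (Bowley's measure), computed from the residual quartiles in the two summary outputs. This is a swapped-in check, not the method used in the analysis above, which relies on the density plot. The function name `bowley_skew` is an assumption.

```r
# Quartile (Bowley) skewness: 0 for a symmetric middle half,
# positive when the upper quartile is further from the median
bowley_skew <- function(q1, med, q3) (q3 + q1 - 2 * med) / (q3 - q1)

# Residual quartiles copied from the two summary outputs
skew_linear <- bowley_skew(q1 = -21.485, med = -6.539, q3 = 13.432)
skew_log    <- bowley_skew(q1 = -0.3767, med = 0.0252, q3 = 0.3826)

print(c(linear = skew_linear, log = skew_log))
```

The log model's value sits much closer to zero than the linear model's, which agrees with the visual impression from the density curves. This measure looks only at the middle half of the residuals, so it says nothing about the extreme tails.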

The census tract with the most negative residual is 6037265301, at the University of California, Los Angeles (Los Angeles County), with a residual of -2.404633. A negative residual means that the regression line overestimated the asthma emergency department visit rate at UCLA for its level of PM2.5. This may be a result of distortion in the age-adjusted rate used for the asthma data: the UCLA tract comprises a large population of students, so other age groups who may be more vulnerable to serious asthmatic events are less represented. UCLA may also have good healthcare resources that assist asthmatic individuals before asthma events escalate into an emergency.
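Because this residual is on the log scale, exponentiating it gives the ratio of the observed rate to the predicted rate for the tract. A minimal sketch, using only the residual value quoted above (variable names are illustrative):

```r
# Residual for tract 6037265301 from the log model, as quoted above
residual_log <- -2.404633

# On the log scale: residual = log(observed) - log(predicted),
# so exp(residual) is observed / predicted
observed_over_predicted <- exp(residual_log)

# Inverse view: how many times larger the prediction is than the observation
predicted_over_observed <- 1 / observed_over_predicted

print(c(observed_over_predicted = observed_over_predicted,
        predicted_over_observed = predicted_over_observed))
```

The observed rate is roughly 9% of what the model predicts for that PM2.5 level, so the prediction is about 11 times too high for this tract.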